Exploring Stylistic Variation with Age and Income on Twitter
Abstract
Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.
Similar Resources
CodeX: Combining an SVM Classifier and Character N-gram Language Models for Sentiment Analysis on Twitter Text
This paper briefly reports our system for the SemEval-2013 Task 2: sentiment analysis in Twitter. We first used an SVM classifier with a wide range of features, including bag of word features (unigram, bigram), POS features, stylistic features, readability scores and other statistics of the tweet being analyzed, domain names, abbreviations, emoticons in the Twitter text. Then we investigated th...
Do You Smile with Your Nose? Stylistic Variation in Twitter Emoticons
On the surface, emoticons seem to convey emotional stances, so we expect smiles and frowns to be used differently from one another. But there are other systematic patterns of variation in emoticons that are not easily described by terms like “friendly” or “sad”. This paper analyzes the 28 most frequent emoticons in use in American English tweets. People vary in their use of eyes, mouth shape, f...
Author Profiling of Twitter Users: Notebook for PAN at CLEF 2015
In this paper, we focused on profiling authors on age, gender, and five personality traits. The corpus consists of anonymized Twitter posts categorized into 4 different languages. Our approach combined TF-IDF, function words, stylistic features, and text bigrams, and used an SVM for each task.
Learning Age and Gender of Blogger from Stylistic Variation
We report results of stylistic differences in blogging for gender and age group variation. The results are based on two mutually independent features. The first feature is the use of slang words which is a new concept proposed by us for Stylistic study of bloggers. For the second feature, we have analyzed the variation in average length of sentences across various age groups and gender. These features ar...
"How Old Do You Think I Am?" A Study of Language and Age in Twitter
In this paper we focus on the connection between age and language use, exploring age prediction of Twitter users based on their tweets. We discuss the construction of a fine-grained annotation effort to assign ages and life stages to Twitter users. Using this dataset, we explore age prediction in three different ways: classifying users into age categories, by life stages, and predicting their e...